conversational ability
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
Zhang, Zhengze, Wang, Shiqi, Shen, Yiqun, Guo, Simin, Lin, Dahua, Wang, Xiaoliang, Nguyen, Cam-Tu, Tan, Fei
Large language models (LLMs) have demonstrated exceptional performance across various applications, but their conversational abilities decline sharply as model size decreases, presenting a barrier to their deployment in resource-constrained environments. Knowledge distillation with Direct Preference Optimization (dDPO) has emerged as a promising approach to enhancing the conversational abilities of smaller models using a larger teacher model. However, current methods primarily focus on 'black-box' knowledge distillation, which uses only the teacher's responses and overlooks the output distribution the teacher offers. This paper addresses this gap by introducing daDPO (Distribution-Aware DPO), a unified method for preference optimization and distribution-based distillation. We provide rigorous theoretical analysis and empirical validation, showing that daDPO outperforms existing methods both in restoring performance for pruned models and in enhancing smaller LLMs. Notably, in in-domain evaluation, our method enables a 20%-pruned Vicuna1.5-7B to achieve near-teacher performance (a -7.3% preference rate versus dDPO's -31%), and allows Qwen2.5-1.5B to occasionally outperform its 7B teacher model (14.0% win rate).
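The abstract above unifies preference optimization with distribution-level distillation. As a rough illustration only, and not the paper's actual objective, the sketch below combines a standard DPO loss with a forward-KL term toward the teacher's next-token distribution; the additive form, the `alpha` weight, and all function names are assumptions made for this example.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Standard DPO on sequence log-probabilities:
    # -log sigmoid(beta * [(chosen log-ratio) - (rejected log-ratio)])
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(_sigmoid(beta * margin))

def kl_divergence(p, q):
    # KL(p || q) for two categorical distributions over the same vocabulary
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distribution_aware_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                            teacher_dist, student_dist, beta=0.1, alpha=0.5):
    # Hypothetical combination: the preference loss plus a distillation
    # term pulling the student's token distribution toward the teacher's.
    pref = dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
    distill = kl_divergence(teacher_dist, student_dist)
    return pref + alpha * distill

# Toy numbers: the chosen response gained probability under the student
# relative to the reference model, the rejected one lost probability.
loss = distribution_aware_loss(
    logp_w=-1.0, logp_l=-2.0, ref_logp_w=-1.2, ref_logp_l=-1.5,
    teacher_dist=[0.7, 0.2, 0.1], student_dist=[0.5, 0.3, 0.2],
)
```

Because the KL term is non-negative, the combined loss is never smaller than the pure preference loss; in a real training loop these quantities would come from model forward passes rather than hand-picked scalars.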
Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models
Li, Jijie, Du, Li, Zhao, Hanyu, Zhang, Bo-wen, Wang, Liangdong, Gao, Boyan, Liu, Guang, Lin, Yonghua
Large Language Models (LLMs) demonstrate strong performance in real-world applications, yet existing open-source instruction datasets often concentrate on narrow domains, such as mathematics or coding, limiting generalization and widening the gap with proprietary models. To bridge this gap, we introduce Infinity-Instruct, a high-quality instruction dataset designed to enhance both foundational and chat capabilities of LLMs through a two-phase pipeline. In Phase 1, we curate 7.4M high-quality foundational instructions (InfInstruct-F-7.4M) from over 100M samples using hybrid data selection techniques. In Phase 2, we synthesize 1.5M high-quality chat instructions (InfInstruct-G-1.5M) through a two-stage process involving instruction selection, evolution, and diagnostic filtering. We empirically evaluate Infinity-Instruct by fine-tuning several open-source models, including Mistral, LLaMA, Qwen, and Yi, and observe substantial performance gains across both foundational and instruction-following benchmarks, consistently surpassing official instruction-tuned counterparts. Notably, InfInstruct-LLaMA3.1-70B outperforms GPT-4-0314 by 8.6% on instruction-following tasks while achieving comparable foundational performance. These results underscore the synergy between foundational and chat training and offer new insights into holistic LLM development. Our dataset (https://huggingface.co/datasets/BAAI/Infinity-Instruct) and code (https://gitee.com/li-touch/infinity-instruct) have been publicly released.
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Lee, Janghwan, Park, Seongmin, Hong, Sukjin, Kim, Minsoo, Chang, Du-Seong, Choi, Jungwook
The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs.
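QDPO, as described above, treats the full-precision model as the alignment target for its quantized counterpart. The sketch below is a hypothetical illustration of one ingredient, detecting the "token-flipping" the abstract mentions and turning a divergent pair of generations into a DPO-style preference pair; the function names and record layout are illustrative assumptions, not the paper's actual data construction.

```python
def first_token_flip(fp_tokens, q_tokens):
    """Index of the first position where the quantized model's greedy token
    diverges from the full-precision model's, or None if they agree."""
    for i, (a, b) in enumerate(zip(fp_tokens, q_tokens)):
        if a != b:
            return i
    return None

def build_preference_pair(prompt, fp_tokens, q_tokens):
    """Use the full-precision generation as 'chosen' and the quantized
    model's divergent generation as 'rejected' (illustrative layout)."""
    flip = first_token_flip(fp_tokens, q_tokens)
    if flip is None:
        return None  # identical outputs: nothing to align on
    return {"prompt": prompt,
            "chosen": fp_tokens,
            "rejected": q_tokens,
            "flip_at": flip}
```

Pairs built this way could then feed a standard DPO trainer, so the quantized model is pushed back toward the behavior of its full-precision counterpart on exactly the prompts where quantization changed the output.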
Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
Zheng, Chen, Sun, Ke, Wu, Hang, Xi, Chenguang, Zhou, Xun
In recent advancements in Conversational Large Language Models (LLMs), a concerning trend has emerged, showing that many new base LLMs experience a knowledge reduction in their foundational capabilities following Supervised Fine-Tuning (SFT). This process often leads to issues such as forgetting or a decrease in the base model's abilities. Moreover, fine-tuned models struggle to align with user preferences, inadvertently increasing the generation of toxic outputs when specifically prompted. To overcome these challenges, we adopted an innovative approach by completely bypassing SFT and directly implementing Harmless Reinforcement Learning from Human Feedback (RLHF). Our method not only preserves the base model's general capabilities but also significantly enhances its conversational abilities, while notably reducing the generation of toxic outputs. Our approach holds significant implications for fields that demand a nuanced understanding and generation of responses, such as customer service. We applied this methodology to Mistral, the most popular base model, thereby creating Mistral-Plus. Our validation across 11 general tasks demonstrates that Mistral-Plus outperforms similarly sized open-source base models and their corresponding instruct versions. Importantly, the conversational abilities of Mistral-Plus were significantly improved, indicating a substantial advancement over traditional SFT models in both safety and user preference alignment.
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
Tu, Quan, Fan, Shilong, Tian, Zihang, Yan, Rui
Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA assessment, complemented by a tailored high-quality dataset. The dataset comprises 1,785 multi-turn role-playing dialogues, encompassing 23,020 examples and featuring 77 characters derived from Chinese novels and scripts. It was carefully constructed, beginning with initial dialogue extraction via GPT-4, followed by rigorous human-led quality control, and enhanced with in-depth character profiles sourced from Baidu Baike. CharacterEval employs a multifaceted evaluation approach, encompassing thirteen targeted metrics on four dimensions. Comprehensive experiments on CharacterEval demonstrate that Chinese LLMs exhibit more promising capabilities than GPT-4 in Chinese role-playing conversation. Source code, data source and reward model will be publicly accessible at https://github.com/morecry/CharacterEval.
Could a chatbot write my restaurant reviews?
One afternoon an email arrives that threatens to end my career. Or at the very least, it makes me think seriously about what the end of my career might look like. It comes from a woman in Ely called Camden Woollven who has an interest in my restaurant reviews, a taste for the absurd and perhaps just a little too much time on her hands. Woollven works in the tech sector and has long been fascinated by OpenAI, a company founded in 2015, with investment from, among others, Elon Musk, to develop user-friendly applications involving artificial intelligence. In November last year, after $10bn worth of investment from Microsoft, OpenAI released ChatGPT, a tool which has been trained on a vast array of data and allows us to commission articles and have human-like text conversations with a chatbot.
4 KPIs to Improve Chatbot Accuracy and its Conversational Abilities
Just in case you imagine that all chatbots are designed alike, you're shockingly off base. Chatbots today come in all shapes and sizes and have varying capabilities. While basic chatbots may be adequate for handling simple operations, improving the customer experience at an enterprise level requires advanced virtual assistants that can understand user sentiment and carry out human-like interactions round the clock across all channels. On the other hand, don't go overboard and construct an intricate AI-powered chatbot purely to compete in the market: having a sophisticated chatbot doesn't, by itself, ensure success.
A Journey Through the History of Chatbots With CTO Avi Ben Ezra – TechieStuffs
The Turing test questions a machine's or AI's capability to exhibit intelligent behavior that is equal to, or indistinguishable from, human behavior. The machine's success is measured not only by the accuracy of its responses but also by the tone in which they are delivered. It is a test in which a human evaluator determines whether a machine matches humans in conversational skill. AI has advanced a great deal since the test's inception, and numerous programs have already passed it. It remains the standard by which the best chatbots are judged.
How to Use Deep Learning to Clone Yourself as a Chatbot (Replika Review) Lionbridge AI
Chatbots are one of the most common applications of natural language processing and machine learning. Replika AI has created a platform where anyone, including people with zero knowledge of machine learning, can create and train a chatbot of their own. After the tragic death of her best friend, Eugenia Kuyda (Founder of Luka inc.) used the text message and email history of her friend to recreate him as a chatbot. The feedback from other friends and family inspired her to expand the project and create Replika AI, a chatbot users train themselves. Numerous companies utilize chatbots for customer interactions, and thus chatbot training data is one of the most in-demand services in the AI industry today.
Does Conversation Hurt Or Help The Chatbot UX? – Smashing Magazine
Chatbot fever has infected Silicon Valley. The leaders of virtually every tech giant -- including Facebook, Google, Amazon and Apple -- proclaim chatbots as the new websites, and messaging platforms as the new browsers. "You should message a business just the way you would message a friend," declared Mark Zuckerberg when he launched the Facebook Messenger Platform for bots. He and the rest of the tech world are convinced that conversation is the future of business. But is chatting actually good for bots?